-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port eamxx rrtmgp to kokkos #2820
Conversation
…rface_to_kokkos * origin/master: (150 commits) EAMxx: added comment, and use override keyword in io diags unit test classes EAMxx: minor fixes to output manager EAMxx: add calls to init_timestep to IO unit tests EAMxx: rephrased comment in output manager EAMxx: only init timestep in IO if this will be an output step EAMxx: allow diags in IO to perform some ops at beginning of a timestep Change DP unit test Do not load PG2 phis git Call setup_io_info correctly if PG2 fields exist edit banner message EAMxx: minor fixes/clarifications in the model output docs EAMxx: add more beef to model output docs mam4xx - Replace KOKKOS_INLINE_FUNCTION with inline for routines that are only invoked by the host. correct comment string, use function call in shoc EAMxx: rework model output docs page EAMxx: small changes to model input docs EAMxx: automatically change var dtype inside io interfaces EAMxx: use 'if constexpr' in a scorpio utility EAMxx: manually delete iop in AD finalize, to avoid hanging files EAMxx: make IOP use directly the scorpio interfaces ...
@@ -630,6 +804,9 @@ void RRTMGPRadiation::run_impl (const double dt) { | |||
|
|||
// Create YAKL arrays. RRTMGP expects YAKL arrays with styleFortran, i.e., data has ncol | |||
// as the fastest index. For this reason we must copy the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original comment is wrong. The semantics of these subview_Nd functions have a big impact on behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have written the comment myself. I think I wasn't referring to the moment when we do the subview, but rather to the fact that we have all those manual copies down below, from kokkos views to yakl arrays. Stuff like
p_lay(i+1,k+1) = d_pmid(icol,k);
t_lay(i+1,k+1) = d_tmid(icol,k);
z_del(i+1,k+1) = d_dz(i,k);
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the comment is not very clear about that. Perhaps we could rephrase, and maybe move it down near the place where we do the data copy...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the buffer variables are there to be the YAKL layout left copy of Kokkos field views. This process controls all the data in the buffer, so no data is coming from anywhere else.
I think the tough part with the comment is that some of the buffer arrays are copied from input fields before running rrtmgp_main(), and some buffer arrays are used as output from rrtmgp_main() and then copied to field views. I think we'll need at least some comment here as like "these are the yakl fields for rrtmgp internal to use" and then comments where input and output data is copied.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that makes sense. So once we have things in a better state, we would just pass d_pmid , or a subview of it, directly to rrtmgp.
components/eamxx/CMakeLists.txt
Outdated
@@ -189,6 +189,9 @@ if (SCREAM_DEBUG) | |||
endif () | |||
endif() | |||
|
|||
set(DEFAULT_RRTMGP_ENABLE_YAKL TRUE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are hardcoded, and never change, you could prob just use TRUE/FALSE in the option(...)
lines down below...
components/eamxx/CMakeLists.txt
Outdated
# to turn on the RRTMGP internal checks here as well, via | ||
# option (RRTMGP_EXPENSIVE_CHECKS "Turn on internal RRTMGP error checking" ${SCREAM_DEBUG}) | ||
# and then adding to scream_config.h: | ||
# #cmakedefine RRTMGP_EXPENSIVE_CHECKS | ||
option (SCREAM_RRTMGP_DEBUG "Turn on extra debug checks in RRTMGP" ${SCREAM_DEBUG}) | ||
|
||
option(SCREAM_RRTMGP_ENABLE_YAKL "Use YAKL under rrtmgp" ${DEFAULT_RRTMGP_ENABLE_YAKL}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering: is there a reason not to move all RRTMGP cmake configurations (new as well as existing ones) down in the src/physics/rrtmgp folder? It seems like a more appropriate place, and helps to keep our main CMakeLists.txt more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see advantages to both. The current approach allows developers to find all cache var options in one place.
@@ -394,19 +546,18 @@ void RRTMGPRadiation::initialize_impl(const RunType /* run_type */) { | |||
m_do_subcol_sampling = m_params.get<bool>("do_subcol_sampling",true); | |||
|
|||
// Initialize yakl | |||
yakl_init(); | |||
init_kls(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is init_kls
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initializes all enabled kernel launchers, either yakl, kokkos, or both.
@@ -630,6 +804,9 @@ void RRTMGPRadiation::run_impl (const double dt) { | |||
|
|||
// Create YAKL arrays. RRTMGP expects YAKL arrays with styleFortran, i.e., data has ncol | |||
// as the fastest index. For this reason we must copy the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have written the comment myself. I think I wasn't referring to the moment when we do the subview, but rather to the fact that we have all those manual copies down below, from kokkos views to yakl arrays. Stuff like
p_lay(i+1,k+1) = d_pmid(icol,k);
t_lay(i+1,k+1) = d_tmid(icol,k);
z_del(i+1,k+1) = d_dz(i,k);
...
@@ -630,6 +804,9 @@ void RRTMGPRadiation::run_impl (const double dt) { | |||
|
|||
// Create YAKL arrays. RRTMGP expects YAKL arrays with styleFortran, i.e., data has ncol | |||
// as the fastest index. For this reason we must copy the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the comment is not very clear about that. Perhaps we could rephrase, and maybe move it down near the place where we do the data copy...
auto subview_1dk = [&](const real1dk v) -> real1dk { | ||
real1dk subv(v, std::make_pair(0, ncol)); | ||
#ifdef RRTMGP_ENABLE_YAKL | ||
real1dk rv(v.label(), ncol); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to deep copy here. I was really just taking a subview, so that all the data copy down below could be done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do need a deep copy if YAKL is enabled:
// If YAKL is on, we don't want aliased memory in both the yakl and kokos
// subviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need them deep copied though? What's wrong with an unmanaged one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you're assuming there's never anything in the input view that would be used later? It's possible that's true but I didn't verify and I don't think it hurts anything.
We can't use unmanaged views because RRTMGP is not currently coded to accept those.
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: jgfouca |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
SCREAM_PullRequest_Autotester_Mappy # 5397 FAILED (click to see last 100 lines of console output)
SCREAM_PullRequest_Autotester_Weaver # 5694 FAILED (click to see last 100 lines of console output)
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: jgfouca |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
SCREAM_PullRequest_Autotester_Mappy # 5399 FAILED (click to see last 100 lines of console output)
SCREAM_PullRequest_Autotester_Weaver # 5696 FAILED (click to see last 100 lines of console output)
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: jgfouca |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
SCREAM_PullRequest_Autotester_Mappy # 5401 FAILED (click to see last 100 lines of console output)
SCREAM_PullRequest_Autotester_Weaver # 5698 FAILED (click to see last 100 lines of console output)
|
#ifdef RRTMGP_ENABLE_KOKKOS | ||
m_gas_concs_k.concs = gas_concs_k; | ||
m_gas_concs_k.ncol = orig_ncol_k; | ||
//VALIDATE_KOKKOS(m_gas_concs, m_gas_concs_k); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments out of order, see the one below.
components/eamxx/src/physics/rrtmgp/eamxx_rrtmgp_process_interface.cpp
Outdated
Show resolved
Hide resolved
components/eamxx/src/physics/rrtmgp/eamxx_rrtmgp_process_interface.cpp
Outdated
Show resolved
Hide resolved
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: jgfouca |
Status Flag 'Pull Request AutoTester' - Error: Jenkins Jobs - A user has pushed a change to the PR before testing completed. NEW EVENT 'committed', ID C_kwDOCEfuetoAKDIxMDZiOGQ3NGE5YzZhMGYwZTUxOTExMWVjNzcyZTEwNTRkZjNmOTA... The Jenkins Jobs will be shutdown; Testing of this PR must occur again. |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
SCREAM_PullRequest_Autotester_Mappy # 5437 ERROR (click to see last 100 lines of console output)
SCREAM_PullRequest_Autotester_Weaver # 5716 ERROR (click to see last 100 lines of console output)
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: jgfouca |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
SCREAM_PullRequest_Autotester_Mappy # 5438 PASSED (click to see last 100 lines of console output)
SCREAM_PullRequest_Autotester_Weaver # 5717 FAILED (click to see last 100 lines of console output)
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: jgfouca |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, target_sha=5555fa2c22ee08aa2ac6b17c06968c9858d09387, However Inspection must be performed before merge can occur... |
2 similar comments
All Jobs Finished; status = PASSED, target_sha=5555fa2c22ee08aa2ac6b17c06968c9858d09387, However Inspection must be performed before merge can occur... |
All Jobs Finished; status = PASSED, target_sha=5555fa2c22ee08aa2ac6b17c06968c9858d09387, However Inspection must be performed before merge can occur... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read through this, and it looks good to me. Thanks @jgfouca!!!
I will note that a pending PR will get rid about of ~500 lines of code from the impl (as they're purely diagnostic)
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
All Jobs Finished; status = PASSED, target_sha=5555fa2c22ee08aa2ac6b17c06968c9858d09387, However Inspection must be performed before merge can occur... |
Status Flag 'Pull Request AutoTester' - AutoMerge IS ENABLED, but the Label AT: AUTOMERGE is not set. Either set Label AT: AUTOMERGE or manually merge the PR... |
p_lev_k(i,nlay) = d_pint(icol,nlay); | ||
t_lev_k(i,nlay) = d_tint(i,nlay); | ||
|
||
// Note that RRTMGP expects ordering (col,lay,bnd) but the FM keeps things in (col,bnd,lay) order |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we change the storage in the FM at some point?
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
Using Repos:
Pull Request Author: jgfouca |
Status Flag 'Pull Request AutoTester' - Error: Jenkins Jobs - A user has closed the PR before testing completed. The Jenkins Jobs will be shutdown |
Status Flag 'Pull Request AutoTester' - Jenkins Testing: 1 or more Jobs FAILED Note: Testing will normally be attempted again in approx. 2 Hrs. If a change to the PR source branch occurs, the testing will be attempted again on next available autotester run. Pull Request Auto Testing has FAILED (click to expand)Build InformationTest Name: SCREAM_PullRequest_Autotester_Mappy
Jenkins Parameters
Build InformationTest Name: SCREAM_PullRequest_Autotester_Weaver
Jenkins Parameters
SCREAM_PullRequest_Autotester_Mappy # 5448 ERROR (click to see last 100 lines of console output)
SCREAM_PullRequest_Autotester_Weaver # 5726 ERROR (click to see last 100 lines of console output)
|
Overall port strategy is very similar to what was done for RRTMGP standalone. Duplication of impls was done at the lower levels, validation is done at the eamxx_rrtmgp_process_interface level.
Change list:
Concerns:
My main concern is with eamxx_rrtmgp_process_interface.cpp and how the buffer memory is wrapped by YAKL arrays. The original author left comments indicating that they thought the various subview_ND functions were copying the data. What is actually happening is they are creating the YAKL equivalent of an unmanaged View, wrapping the provided raw pointer. The data coming in from the buffer may not be layout left but it is being treated as layout left in the yakl arrays.
I also had some concerns with m_gas_concs and the gas_concs temporary. Because subview_ND wraps the original pointer, changes to m_gas_concs will change gas_concs; not sure if that was the intent.
Testing:
Tested TAS standalone with